Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/2569 Analysis API: compute effective sample size #2575

Merged

Conversation

roualdes
Copy link
Collaborator

@roualdes roualdes commented Jul 9, 2018

Submission Checklist

  • Run unit tests: ./runTests.py src/test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Ran tests mentioned below and ./runTests.py src/test/unit/mcmc

Summary

Create new route for unified calculation of effective sample size. Discussion on Discourse and in issue #2569.

Add function src/stan/analyze/mcmc/compute_ess.hpp along with tests.

Add method on chains class in src/stan/mcmc/chains.hpp in an effort to minimize changes necessary for adoption into CmdStan.

Intended Effect

Add unified API for interfaces to compute effective sample size, and maintain backwards compatibility during the transition to the new route.

How to Verify

Tests live in src/stan/test/unit/analyze/mcmc/compte_ess_test.cpp. Run

make clean && ./runTests.py -j2 src/test/unit/analyze/

Side Effects

None intended.

Documentation

Inline, via doxygen.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): California State University, Chico

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@roualdes
Copy link
Collaborator Author

roualdes commented Jul 9, 2018

Please note my use of a function template on compute_effective_sample_size. With the template, no interface should need to make copies of draws. Without it, CmdStan will need to make a copy. My understanding is that CmdStan and RStan will have no problems instantiating the templates they need:

CmdStan is able/will instantiate compute_effective_sample_size(std::vector<const double*>, std::vector<size_t> sizes) to reuse the Eigen data structures.

RStan is able/will instantiate either the above or the below signature, as I don't think Rcpp is as unhappy about const as cython is.

Is PyStan able to instantiate compute_effective_sample_size(std::vector<double*>, std::vector<size_t>)? Or could we get away with Stan instantiating it for PyStan, somehow?

@ariddell, would you please opine your ability to deal with my use of a function template?

@riddell-stan
Copy link

That signature looks fine to me. I don't think there should be any problem calling that function with Cython from PyStan.

@roualdes
Copy link
Collaborator Author

10/11 travis checks were successful. The one error just says no output received in the last 10 minutes. I can't find more details on the error. Let me know what I can do to help.

@bob-carpenter
Copy link
Contributor

Thanks for the heads up. I restarted the job.

If you don't have credentials for kicking Travis, @seantalts should be able to set you up.

Copy link
Member

@syclik syclik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest thing to change here is the implementation of chains. It shouldn't have picked up a new method. Instead, the old method should have dispatched to your new function.

* @param std::vector stores sizes of chains
* @return effective sample size for the specified parameter
*/
template<typename T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this templated? It looks like on the issue and looking at the implementation, this can be removed and the type of the input be double *. I don't know if it needs to be templated for something like RStan to use well, but if so, please document it.

If leaving the template parameter in, please add the template parameter documentation using @tparam. See: https://github.com/stan-dev/stan/wiki/How-to-Write-Doxygen-Doc-Comments#function-doc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for asking, as I was hoping for feedback on this. I addressed my reasoning behind templating this function in a comment immediately following this PR's initial comment. I've come to learn that you'd prefer such details/requests for feedback in the PR's initial comment.

The short story is that CmdStan won't have to make any copies if it can use compute_effective_sample_size(std::vector<const double*>, std::vector<size_t> sizes), but everyone agreed to compute_effective_sample_size(std::vector<double*>, std::vector<size_t> sizes), for which CmdStan will necessitate a copy. In an attempt to satisfy all requests of our previous discussions, I templated this function.

If RStan and PyStan can both call either signature, then I'd recommend we change the type of the first argument from std::vector<double*> to std::vector<const double*> and drop the template.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up! We should use const double *. I think it should actually be const double * const. The pointers don't change and we never mutate the elements.

CmdStan should be able to handle it. We'll figure out how to make RStan and PyStan work with that signature, even if it means casting the consts away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up! We should use const double *. I think it should actually be const double * const. The pointers don't change and we never mutate the elements.

CmdStan should be able to handle it. We'll figure out how to make RStan and PyStan work with that signature, even if it means casting the consts away.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The const double * const is a good thought and I agree with the reasoning behind it, but I don't know how to deal with that type here. Wouldn't this necessitate initialization at declaration, since it won't allow modification later? For instance, I don't see a reasonable way get the rvalues of line 616 in src/stan/mcmc/chains.hpp into the (would be) container std::vector<const double * const> draws.

Eigen::VectorXd chain_mean(num_chains);
Eigen::VectorXd chain_var(num_chains);
for (size_t chain = 0; chain < num_chains; ++chain) {
Eigen::Map<const Eigen::Matrix<double, Eigen::Dynamic, 1> >
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With C++11, we don't need to have the space between the two right angle brackets (> >). Please remove the space between the two.

@@ -0,0 +1,100 @@
#ifndef STAN_ANALYZE_ESS_HPP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename the file to match the function name: compute_effective_sample_size.hpp.

Please update the header guard to match the file structure exactly: STAN_ANALYZE_MCMC_COMPUTE_EFFECTIVE_SAMPLE_SIZE_HPP.

template<typename T>
double compute_effective_sample_size(std::vector<T> draws, size_t size) {
size_t num_chains = draws.size();
std::vector<size_t> sizes(num_chains);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot easier with C++11! This can just be:

std::vector<size_t> sizes(num_chains, size);

and you can remove the loop!

@@ -701,6 +702,20 @@ namespace stan {
return effective_sample_size(index(name));
}

double compute_effective_sample_size(const int index) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... this isn't what should happen. Remove this function signature. Replace the existing class method, effective_sample_size() (line 230), with a dispatch to the function you've written.

We'll replace each part of this class in that fashion where all the work is done outside. Then remove the class itself later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned up chains.hpp as requested.

Should I also remove the effective sample size tests in src/test/unit/mcmc/chain_test.cpp, so as to rely on the tests in src/test/unit/analyze/mcmc/compute_effective_sample_size.cpp?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be ideal! Thanks! The test in chains should just check that it's callable, not really check for the validity of the value that's returned.

If you don't want to do that, no prob.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be ideal! Thanks! The test in chains should just check that it's callable, not really check for the validity of the value that's returned.

If you don't want to do that, no prob.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE check that it's callable. Is something like this satisfactory within src/test/unit/mcmc/chains_test.cpp?

EXPECT_NO_THROW(chains.effective_sample_size(1.0))
    << "calling chain.effective_sample_size(index = 1.0).";

@roualdes
Copy link
Collaborator Author

Thanks for the feedback. Please give me some time to address your comments as I don’t have access to a computer until Tuesday.

@syclik
Copy link
Member

syclik commented Jul 25, 2018

No problem! Sorry for the delay on getting it reviewed. Hopefully we can keep up with the queue a little faster.

@syclik
Copy link
Member

syclik commented Jul 31, 2018 via email

@roualdes
Copy link
Collaborator Author

roualdes commented Aug 2, 2018

I'm trying to debug this error from jenkins, but failing. The error message on jenkins provides enough for me to locate the error -- something following

bin/stansummary src/test/interface/matrix_output.csv

but not enough info to move from there. I would like to recreate the above command relative to this branch. When I execute the following, this branch isn't available.

git clone --recursive https://github.com/stan-dev/cmdstan.git
cd stan
# git checkout feature/issue-2569-analysis-api-ess # branch not available

Is this branch not available because it's coming from my fork of stan-dev/stan? Are my attempts at git off base? Is this idea possible?

@syclik Any other tips or suggestions you might have to help me move forward would be much appreciated. Thanks.

@syclik
Copy link
Member

syclik commented Aug 2, 2018 via email

@seantalts
Copy link
Member

When you click on the failing build from github, you get to this page. On that page you can see that the upstream tests failed. At the bottom of the page is a link to the failing upstream tests, awkwardly named "downstream tests" (the red X should draw your attention). Clicking that brings you here, which is where Daniel's link goes. Looks like the interface tests failed on all platforms (we should maybe stage those so they aren't all executing in parallel, haha).

@roualdes
Copy link
Collaborator Author

roualdes commented Aug 2, 2018

Thanks to you both for your quick responses.

I (previously) made it as far you both suggest.

The error comes when the test, named CommandStansummary.functional_test__issue_342, issues the following command: bin/stansummary src/test/interface/matrix_output.csv. I don't understand why this command is failing.

As I see it, either my code isn't building correctly and thus CmdStan can't find the function stan::analyze::compute_effective_sample_size, or something is wrong with my code.

In an effort to learn why this fails, I'd like to run the same CmdStan command on my personal machine: bin/stansummary src/test/interface/matrix_output.csv. My last comment offers what git commands did not work for me. Is there any git magic I can use to build CmdStan from this (forked) branch of stan-dev/stan?

If this thought process is off target, please don't hesitate to say so.

@seantalts
Copy link
Member

I see, sorry. There are failing tests and that's why the command quit with an error.

re: testing CmdStan with your branch - Cmdstan has Stan and Math as submodules. Clone your fork of cmdstan with --recursive and then cd stan; git checkout <my_branch>. If you want to work from our fork of CmdStan but your fork of Stan, you can add your fork as a remote.

@roualdes
Copy link
Collaborator Author

roualdes commented Aug 2, 2018

That did it. Thanks @seantalts for the tip about remotes. I'll report back when I have something to offer about this error.

@roualdes
Copy link
Collaborator Author

roualdes commented Aug 2, 2018

The error: I had previously used size_t frequently, instead of int. This caused problems for the code chunk following the comment "Geyer's initial monotone sequence" in src/stan/analyze/mcmc/compute_effective_sample_size.hpp. According to the Google style guide, even though size_t is not disallowed, int is much preferred.

The fix: replace size_t with int in most places.

Note that neither stan-dev/stan:develop nor this PR's tests caught this. Is it worth adding a test?

@syclik Before this gets final approval, did you see my comment about the types const double * const and const double*?

@syclik
Copy link
Member

syclik commented Aug 3, 2018

Sorry -- I missed it the first time around. Here's the last comment:

The const double * const is a good thought and I agree with the reasoning behind it, but I don't know how to deal with that type here. Wouldn't this necessitate initialization at declaration, since it won't allow modification later? For instance, I don't see a reasonable way get the rvalues of line 616 in src/stan/mcmc/chains.hpp into the (would be) container std::vector<const double * const> draws.

Please ignore the const double * const suggestion. I think it's going to be practically difficult for reasons you've seen.

Thanks!

@syclik syclik merged commit 11d5b7b into stan-dev:develop Aug 3, 2018
@roualdes
Copy link
Collaborator Author

roualdes commented Aug 3, 2018

Thanks all for your help on this. I especially appreciate your patience with me as I learn and gain experience in Stan and C++11.

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Aug 4, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants